Performance measurement framework for hierarchical text classification
نویسندگان
چکیده
Hierarchical text classification or simply hierarchical classification refers to assigning a document to one or more suitable categories from a hierarchical category space. In our literature survey, we have found that the existing hierarchical classification experiments used a variety of measures to evaluate performance. These performance measures often assume independence between categories and do not consider documents misclassified into categories that are similar or not far from the correct categories in the category tree. In this paper, we therefore propose new performance measures for hierarchical classification. The proposed performance measures consist of category similarity measures and distance-based measures that consider the contributions of misclassified documents. Our experiments on hierarchical classification methods based on SVM classifiers and binary Naı̈ve Bayes classifiers showed that SVM classifiers perform better than Naı̈ve Bayes classifiers on Reuters-21578 collection according to the extended measures. A new classifier-centric measure called blocking measure is also defined to examine the performance of subtree classifiers in a top-down levelbased hierarchical classification method.
منابع مشابه
خوشهبندی اسناد مبتنی بر آنتولوژی و رویکرد فازی
Data mining, also known as knowledge discovery in database, is the process to discover unknown knowledge from a large amount of data. Text mining is to apply data mining techniques to extract knowledge from unstructured text. Text clustering is one of important techniques of text mining, which is the unsupervised classification of similar documents into different groups. The most important step...
متن کاملActive Learning for Hierarchical Text Classification
Hierarchical text classification plays an important role in many real-world applications, such as webpage topic classification, product categorization and user feedback classification. Usually a large number of training examples are needed to build an accurate hierarchical classification system. Active learning has been shown to reduce the training examples significantly, but it has not been ap...
متن کاملMulti-Dimensional Text Classification
This paper proposes a multi-dimensional framework for classifying text documents. In this framework, the concept of multidimensional category model is introduced for representing classes. In contrast with traditional flat and hierarchical category models, the multi-dimensional category model classifies each text document in a collection using multiple predefined sets of categories, where each s...
متن کاملMulti-label Hierarchical Text Classification using the ACM Taxonomy
Many of the works of text classification involve the attribution of each text a single class label from a predefined set of classes, usually small and flat organized (flat classification). However, there are more complex classification problems in which we can assign to each text more than one class (multi-label classification), that can be organized in a hierarchical structure (hierarchical cl...
متن کاملHierarchical Text Classification with Latent Concepts
Recently, hierarchical text classification has become an active research topic. The essential idea is that the descendant classes can share the information of the ancestor classes in a predefined taxonomy. In this paper, we claim that each class has several latent concepts and its subclasses share information with these different concepts respectively. Then, we propose a variant Passive-Aggress...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- JASIST
دوره 54 شماره
صفحات -
تاریخ انتشار 2003